Tesfaldet M, Brubaker M A, Derpanis K G. Two-stream convolutional networks for dynamic texture synthesis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6703-6712.
1. Overview
1.1. Motivation
The two-stream hypothesis models the human visual cortex in terms of two pathways:
- ventral stream. involved in object recognition
- dorsal stream. involved in motion processing
This paper proposes a two-stream model for dynamic texture synthesis:
- appearance stream. encapsulates the per-frame appearance; pre-trained for object recognition
- dynamics stream. models the temporal dynamics; pre-trained for optical flow prediction
- combining the texture appearance of one texture with the dynamics of another generates entirely novel dynamic textures
- first work to demonstrate this form of style transfer
1.2. Related Work
- Two general approaches to texture synthesis
  - non-parametric sampling
  - statistical parametric models
- Gram Matrix
  - captures the style information while ignoring spatial location
  - computed by flattening the feature map and multiplying by its transpose: [b, c, h, w] → [b, c, hw], then [b, c, hw] × [b, hw, c] → [b, c, c]; see the sketch below
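A minimal PyTorch sketch of this computation (the function name and the 1/(hw) normalization are my choices, not taken from the paper):

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Batched Gram matrices from conv activations.

    features: [b, c, h, w] activations from some network layer.
    Returns:  [b, c, c] channel-to-channel correlations; summing over
              spatial positions is what discards the spatial location.
    """
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)          # [b, c, h, w] -> [b, c, hw]
    g = torch.bmm(f, f.transpose(1, 2))     # [b, c, hw] x [b, hw, c] -> [b, c, c]
    return g / (h * w)                      # normalize by spatial size
```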
1.3. Future Work
- extend the idea of a factorized representation into feed-forward generative networks
2. Method
Synthesizing a dynamic texture is formulated as an optimization problem: match the activation statistics (Gram matrices) of the generated sequence to those of the target texture(s), as sketched below.
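At the top level this is Gatys-style optimization over raw pixels. A minimal sketch, not the authors' code: `loss_fn` is assumed to implement the Gram-matching objectives of Sections 2.1 and 2.2, and L-BFGS follows common practice for this style of synthesis (the paper's optimizer may differ).

```python
import torch

def synthesize(loss_fn, t_out=12, h=256, w=256, steps=20):
    """Optimize the pixels of t_out frames directly.

    loss_fn: maps a [t_out, 3, h, w] tensor to a scalar loss that
             compares activation statistics against the targets.
    """
    # The synthesized video starts as noise; its pixels are the only
    # optimization variables (both pre-trained networks stay frozen).
    frames = torch.randn(t_out, 3, h, w, requires_grad=True)
    opt = torch.optim.LBFGS([frames], max_iter=25)

    def closure():
        opt.zero_grad()
        loss = loss_fn(frames)
        loss.backward()
        return loss

    for _ in range(steps):
        opt.step(closure)
    return frames.detach()
```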
2.1. Appearance Stream
Gram matrices are computed from the ConvNet activations:
- N_l. the number of filters in layer l
- M_l. the number of spatial locations in layer l
- t. the frame index (time t)
- k. the index of the spatial location
- i, j. the indices of the filters
The target statistics are Gram matrices averaged over the target frames (as ground truth):
- T. the number of target frames
A Gram matrix is also computed for each single frame to be synthesized (as prediction).
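Putting these symbols together, the per-frame Gram matrix and the target statistics should look roughly as follows (reconstructed from the definitions above; the normalization constants may differ from the paper):

```latex
% Gram matrix of layer l at frame t: correlation of filters i and j
G^{lt}_{ij} = \frac{1}{N_l M_l} \sum_{k=1}^{M_l} F^{lt}_{ik} F^{lt}_{jk}

% Target statistics: Gram matrices averaged over the T target frames
\bar{G}^{l}_{ij} = \frac{1}{T} \sum_{t=1}^{T} G^{lt}_{ij}
```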
- The Loss Function
  - L_{app}. the number of layers used to compute Gram matrices
  - T_{out}. the number of frames being generated in the output
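With those definitions, the appearance loss should take roughly this form (a reconstruction from the notes above; the exact per-layer weighting may differ in the paper):

```latex
\mathcal{L}_{\text{app}} = \frac{1}{T_{\text{out}} L_{\text{app}}}
  \sum_{t=1}^{T_{\text{out}}} \sum_{l=1}^{L_{\text{app}}}
  \big\| G^{lt} - \bar{G}^{l} \big\|_F^2
```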
2.2. Dynamic Stream
- input. a pair of consecutive greyscale frames
- T-1. the T target frames are grouped into (T-1) consecutive pairs
- The Loss Function. analogous to the appearance loss, but with Gram matrices computed on the flow network's activations, averaged over the (T-1) target pairs
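By analogy, the dynamics loss should mirror the appearance loss over frame pairs. In my notation (not the paper's): D^{lt} is the Gram matrix of layer l activations on pair (t, t+1), L_{dyn} the number of flow-network layers used, and \bar{D}^{l} the average over the (T-1) target pairs:

```latex
\mathcal{L}_{\text{dyn}} = \frac{1}{(T_{\text{out}}-1) L_{\text{dyn}}}
  \sum_{t=1}^{T_{\text{out}}-1} \sum_{l=1}^{L_{\text{dyn}}}
  \big\| D^{lt} - \bar{D}^{l} \big\|_F^2
```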
2.3. Overall
- the total objective combines the appearance and dynamics losses
- memory usage increases as the number of frames grows
- therefore, the sequence is separated into sub-sequences that are synthesized one at a time
- the first frame of each sub-sequence is initialized as the last frame of the previous sub-sequence and kept fixed, which keeps consecutive sub-sequences temporally consistent; see the sketch below
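A sketch of this sub-sequence scheme (the helper `synthesize_subsequence` is hypothetical; the key point is that the overlap frame is carried over and excluded from optimization):

```python
import torch

def synthesize_long(synthesize_subsequence, num_subseqs, sub_len):
    """Generate a long video piecewise so memory stays bounded.

    synthesize_subsequence(first_frame, length) is assumed to optimize
    `length` frames and return them as a [length, 3, h, w] tensor; when
    first_frame is not None it is kept fixed (excluded from the
    optimization variables), so consecutive sub-sequences join seamlessly.
    """
    video = []
    first = None  # the first sub-sequence starts from scratch
    for _ in range(num_subseqs):
        sub = synthesize_subsequence(first_frame=first, length=sub_len)
        # Drop the fixed overlap frame so it is not duplicated in the output.
        video.extend(sub if first is None else sub[1:])
        first = sub[-1]  # seed the next sub-sequence with the last frame
    return torch.stack(list(video))
```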
3. Experiments
3.1. w/o Dynamic Stream
3.2. Dynamics Loss: Flow Decode Layer vs Concat Layer
- computing the dynamics loss on the concatenation-layer activations is far more effective than using the flow decode layer
3.3. Failure Examples
- fails to capture spatially-inconsistent dynamics
- fails to capture textures with spatially-variant appearance